Search Results: "John Goerzen"

28 April 2021

Russell Coker: Links April 2021

Dr Justin Lehmiller's blog post comparing his official (academic style) and real biographies is interesting [1]. Also the rest of his blog is interesting too; he works at the Kinsey Institute, so you know he's good. Media Matters has an interesting article on the spread of vaccine misinformation on Instagram [2]. John Goerzen wrote a long post summarising some of the many ways of having a decentralised Internet [3]. One problem he didn't address is how to choose between them; I could spend months of work to set up a fraction of those services. Erasmo Acosta wrote an interesting Medium article "Could Something as Pedestrian as the Mitochondria Unlock the Mystery of the Great Silence?" [4]. I don't know enough about biology to determine how plausible this is. But it is a worry; I hope that humans will meet extra-terrestrial intelligences at some future time. Meredith Haggerty wrote an insightful Medium article about the love vs money aspects of romantic comedies [5]. Changes in viewer demographics would be one factor that makes lead actors in romantic movies significantly less wealthy in recent times. Informative article about ZIP compression and the history of compression in general [6]. Vice has an insightful article about one way of taking over SMS access of phones without affecting voice call or data access [7]. With this method the victim won't notice that their service is being interfered with until it's way too late. They also explain the chain of problems in the US telecommunications industry that led to this. I wonder what's happening in this regard in other parts of the world. The clown code of ethics (8 Commandments) is interesting [8]. Sam Hartman wrote an insightful blog post about the problems with RMS and how to deal with him [9]. Also Sean Whitton has an interesting take on this [10]. Another insightful post is by Selam G about RMS's long history of bad behavior and the way universities are run [11]. Cory Doctorow wrote an insightful article for Locus about free markets with a focus on DRM on audio books [12]. We need legislative changes to fix this!

22 February 2021

John Goerzen: Recovering Our Lost Free Will Online: Tools and Techniques That Are Available Now

As I've been thinking and writing about privacy and decentralization lately, I had a conversation with a colleague this week, and he commented about how loss of privacy is related to loss of agency: that is, loss of our ability to make our own choices, pursue our own interests, and be master of our own attention.

In terms of telecommunications, we have never really been free, though in terms of the Internet and its predecessors, there have been times where we had a lot more choice. Many are too young to remember this, and for others, that era is a distant memory. The irony is that our present moment is one of enormous consolidation of power, and yet also one of a proliferation of technologies that let us wrest back some of that power. In this post, I hope to enlighten or remind us of some of the choices we have lost, and also talk about the ways in which we can choose to regain them, already, right now. I will talk about the possibilities, the big dreams that are possible now, and then go into more detail about the solutions.

The Problems & Possibilities

The limitations of online

We make the assumption that we must be online to exchange data. This is reinforced by many modern protocols; Twitter clients, for instance, don't tend to let you make posts by relaying them through disconnected devices. What would it be like if you could fully participate in global communities without a constant Internet connection? If you could share photos with your friends, read the news, read your email, etc., even if you don't have a connection at present? Even if the device you use to do that never has a connection, but can route messages via other devices that do? Would it surprise you to learn that this was once the case? Back in the days of UUCP, much email and Usenet news (a global discussion forum that didn't require an Internet connection) was relayed via occasional calls over phone lines. This technology remains with us, and has even improved. Sadly, many modern protocols make no effort in this regard. Some email clients will let you compose messages offline to send when you get online later, but the assumption always is that you will be connected to an IP network again soon. NNCP, on the other hand, lets you relay messages over TCP, a radio, a satellite, or a USB stick. Email and Usenet, since they were designed in an era where store-and-forward was valued, can actually still be used in an entirely offline fashion (without ever touching an IP-based network). All it takes is for someone to care to make it happen. You can even still do it over UUCP if you like.
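To make that concrete, here is a minimal, hedged sketch of queuing email over NNCP (the node name is hypothetical; it assumes the two machines have already exchanged keys and that the remote side defines a "sendmail" exec handle in its nncp.hjson):

# Queue a message for the node "mailhub"; no connectivity is needed now.
# The packet waits in the local spool until some transport carries it:
# a TCP call, a radio link, or a USB stick walked across the room.
echo "Hello from an offline machine" | nncp-exec mailhub sendmail user@example.com

The message rides along whenever a link of any kind next becomes available, exactly in the store-and-forward spirit described above.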
The physical and data link layers

Many of us just accept that we communicate in a few ways: Wifi for short distances, then cable modems or DSL for our local Internet connection, and then many people are fuzzy about what happens after that. Or, alternatively, we have 4G phones that are the local Internet connection, and the same fuzzy things happen after that. Think about this for a moment. Which of these do you control in any way? Sometimes just wifi; sometimes maybe you have choices of local Internet providers. After that, your traffic is handled by enormous infrastructure companies. There is choice here. People in ham radio have been communicating digitally over long distances without the support of the traditional Internet for decades, and the technology to do this is now more accessible to anyone. Long-distance radio has had tremendous innovation in the last decade; cheap radios can now communicate over several miles/km without any other infrastructure at all. We all carry around radios (Wifi and Bluetooth) in our pockets that don't have to be used as mere access points to the Internet or as drivers of headphones, but can also form their own networks directly (Briar). Meshtastic is an example; it's an instant messenger that can form a mesh over many miles/km and requires no IP infrastructure at all. Briar is similar. XBee radios form a mesh in hardware, allowing peers to reach each other (also over many miles/km) with a serial or framed protocol.

Loss of peer-to-peer

Back in the late 90s, I worked at a university. I had a 386 on my desk for a workstation, not a powerful computer even then. But I put the boa webserver on it and could just serve pages on the Internet. I didn't have to get permission. Didn't have to pay a hosting provider. I could just DO it. And of course that is because the university had no firewall and no NAT. Every PC at the university was a full participant on the Internet as much as the servers at Microsoft or DEC. All I needed was a DNS entry. I could run my own SMTP server if I wanted, run a web or Gopher server, and that was that. There are many reasons why this changed. Nowadays most residential ISPs will block SMTP for their customers, and if they didn't, others would; large email providers have decided not to federate with IPs in residential address spaces. Most people have difficulty even getting a static IP address in the first place. Many are behind firewalls, NATs, or both, meaning that incoming connections of any kind are problematic. Do you see what that means? It has weakened the whole point of the Internet being a network of peers. While IP still acts that way, as a practical matter, there are clients that are prevented from being servers by administrative policy they have no control over. Imagine if you, a person with an Internet connection to your laptop or phone, could just decide to host a website or a forum on it. For moderate levels of load, they are certainly capable of this. The only thing in the way is the network management policies you can't control. Elaborate technologies exist to try to bridge this divide, and some, like Tor or cjdns, can work quite well. More on this below.

Expense of running something popular

Related to the loss of peer-to-peer infrastructure is the very high cost of hosting something popular. Do you want to share videos with lots of people? That almost certainly is going to require expensive equipment and bandwidth. There is a reason that there are only a small handful of popular video streaming sites online. It requires a ton of money to host videos at scale. What if it didn't? What if you could achieve economies of scale so much that you, an individual, could compete with the likes of YouTube? You wouldn't necessarily have to run ads to support the service. You wouldn't have to have billions of dollars or billions of viewers just to make it work. This technology exists right now. Of course many of you are aware of how Bittorrent leverages the swarm for files. But projects like IPFS, Dat, and Peertube have taken this many steps further to integrate it into a global ecosystem. And, at least in the case of Peertube, this is a thing that works right now in any browser already!
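For a sense of how small the barrier has become, here is a hedged sketch of sharing a video through IPFS using the go-ipfs (kubo) command-line tools (the file name and CID below are hypothetical placeholders):

ipfs init                  # one-time: create a local node identity
ipfs daemon &              # join the public swarm
ipfs add mytalk.webm       # prints a content ID (CID) for the file
# Anyone can now fetch it by CID; popular content is served by many
# peers at once, so the original host doesn't bear the whole load:
ipfs cat QmExampleCID... > mytalk.webm

The more people fetch and re-provide the content, the cheaper it is for the person who first published it.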
Application-level walled gardens

I was recently startled at how much excitement there was when Github introduced "dark mode". Yes, Github now offers two colors on its interface. Already back in the 80s and 90s, many DOS programs had more options than that. Git is a decentralized protocol, but Github has managed to make it centralized. Email is a decentralized protocol (pick your own provider, and they all communicate), but Facebook and Twitter aren't. You can't just pick your provider for Facebook. It's Facebook or nothing. There is a profit motive in locking others out; these networks want to keep you using their platforms because their real customers are advertisers, and they want to keep showing you ads. Is it possible to have a world where you get to pick your own app for sharing photos, and it works even if your parents use a different one? Yes, yes it is. Mastodon and the Fediverse are fantastic examples for social media. Pixelfed is specifically designed for photos, Mastodon for short-form communication, there's Pleroma for more long-form communication, and they all work together. You can use Mastodon to read Pleroma content or look at Pixelfed photos, and there are many (free) providers of each.

Freedom from manipulation

I recently wrote about the dangers of the attention economy, so I won't go into a lot of detail here. Fundamentally, you are not the customer of Facebook or Google; advertisers are. They optimize their sites to keep you on them as much as possible so that they can show you as many ads as possible, which makes them as much money as possible. Ads, of course, are fundamentally seeking to manipulate your behavior ("buy this product"). By lowering the cost of running services, we can give a huge boost to hobbyists and nonprofits that want to do so without an ultimate profit motive. For-profit companies benefit also, with a dramatically reduced cost structure that frees them to pursue their mission instead of so many ads.

Freedom from snooping (privacy and anonymity)

These days, it's not just government snooping that people think about. It's data stolen by malware, spies at corporations (whether human or algorithmic), and even things like basic privacy of one's own security footage. Here the picture is improving; encryption in transit, at least at a basic level, has become much more common with TLS being a standard these days. Sadly, end-to-end encryption (E2EE) is not nearly as common, perhaps because corporations have a profit motive to have access to your plaintext and metadata. Closely related to privacy is anonymity: that is, being able to do things in an anonymous fashion. The two are not necessarily equal: you could send an encrypted message but reveal who the correspondents are, as with email; or, you could send a plaintext message over a Tor exit node that hides who the correspondents are. It is sometimes difficult to achieve both. Nevertheless, numerous answers exist here that tackle one or both problems, from the Signal messenger to Tor.

Solutions That Exist Today

Let's dive in to some of the things that exist today. One concept you'll see in many of these is integrated encryption with public keys used for addressing. In other words, your public key is akin to an IP address (and in some cases, is literally your IP address).

Data link and networking technologies (some including P2P)

P2P Infrastructure

While some of the technologies above, such as cjdns, explicitly facilitate peer-to-peer communication, there are some other application-level technologies to look at.
Instant Messengers and Chat

I won't go into a lot of detail here since I recently wrote a roundup of secure mesh messengers and also a followup article about Signal and some hidden drawbacks of P2P. Please refer to those articles for some interesting things that are happening in this space. Matrix is a distributed IM platform similar in concept to Slack or IRC, but globally distributed in a mesh. It supports optional E2EE.

Social Media

I wrote recently about how to join the Fediverse, which covered joining Mastodon, a federated, decentralized social network. Mastodon is the largest of these, with several million users, and is something of a much nicer version of Twitter. Mastodon is also part of what is known as the "Fediverse": applications that are loosely joined together by their support of the ActivityPub protocol. Other popular Fediverse applications include Pixelfed (similar to Instagram) and Peertube for sharing video. Peertube is particularly interesting in that it supports Webtorrent for efficiently distributing popular videos. Webtorrent is akin to Bittorrent running efficiently inside your browser.

Concluding Remarks

Part of my goal with this is encouraging people to dream big, to ask questions like: What could you do if offline were easy? What is possible if you have freedom in the physical and data link layers? Dream big. We're so used to thinking that it's quite difficult for two devices on the Internet to talk to each other. What would be possible if this were actually quite easy? The assumption that costs rise dramatically as popularity increases is also baked into our thought processes. What if that weren't the case? Could you take on YouTube from your garage? Would lowering barriers to entry weaken the ad economy and let nonprofits have more equal footing with large corporations? We have so many walled gardens, from Github to Facebook, that we almost forget it doesn't have to be that way. So having asked these questions, my secondary point is to suggest that these aren't pie-in-the-sky notions. These possibilities are with us right now. You'll notice from this list that virtually every one of these technologies is ad-free at its heart (though some would be capable of serving ads). They give you back your attention. Many preserve privacy, anonymity, or both. Many dramatically improve your freedom of association and communication. Technologies like IPFS and Bittorrent ease the burden of running something popular. Some are quite easy to use (Mastodon or Peertube) while others are much more complex (libp2p or the lower-level mesh network systems). Clearly there is still room for improvement in many areas. But my fundamental point is this: good technology is here, right now. Technical people can vote with their feet and wallets and start using it. Early adopters will help guide the way for the next set of improvements. Join us!

4 February 2021

John Goerzen: A Simple, Delay-Tolerant, Offline-Capable Mesh Network with Syncthing (+ optional NNCP)

A little while back, I spent a week in a remote area. It had no Internet and no cell phone coverage. Sometimes, I would drive in to town where there was a signal to get messages, upload photos, and so forth. I had to take several devices with me: my phone, my wife's, maybe a laptop or a tablet too. It seemed there should have been a better way. And there is. I'll use this example to talk about a mesh network, but it could just as well apply to people wanting to communicate on a 12-hour flight that has no in-flight wifi, or spacecraft with an intermittent connection, or a person traveling.

Syncthing makes a wonderful solution for things like these. Here are some interesting things about Syncthing: Syncthing works by having you define devices and folders. You can choose which devices to share folders with. A shared folder has an ID that is unique across Syncthing. You can share a folder from device A to device B, and then device B can share it with device C, even if A and C don't know about each other or have no way to communicate. More commonly, though, all the devices would know about each other and will opportunistically communicate the best way they can. Syncthing uses something akin to a Bittorrent protocol. Say you're syncing videos from your phone, and they're going to 3 machines. It doesn't mean that Syncthing has to send them three times from the phone. Syncthing will send each block, most likely, just once; the other nodes in the swarm will register the block availability from the first node to get it and will exchange blocks among themselves. Syncthing will typically look for devices on the local LAN. Failing that, it will use an introduction server to see if it can reach them directly using P2P. Failing that, perhaps due to restrictive firewalls or NAT, communication can be relayed through volunteer-run Syncthing servers on the Internet. All Syncthing communications are cryptographically encrypted and verified. You can also configure Syncthing arbitrarily; for instance, to run over ssh or Tor tunnels.

So, let's look at how Syncthing might help with the example I laid out up front. All the devices at the remote location could communicate with each other. The Android app is quite capable of syncing photos and videos using Syncthing, for instance. Then one device could be taken to the Internet location and it would transmit data on behalf of all the others, perhaps back to a computer at your home, or to a server somewhere. Perhaps a script running on the remote server would then move files out of the Syncthing synced folder into permanent storage elsewhere, triggering a deletion to be sent to the phone to free up storage. When the phone gets back to the other devices, the deletion can be propagated to them to free up storage there too. Or maybe you have a computer out in a shed or somewhere without Internet access that you go to periodically, and need to get files to it. Again, your phone could be a carrier.

Taking it a step further

If you envision a file as a packet, you could, conceivably, do something like tunnel TCP/IP over Syncthing, assuming generous-enough timeouts. It can truly handle communication. But you don't need TCP/IP for this. Consider some other things you could do: You can start to see how there are a lot of possibilities here that extend beyond just file synchronization, though they are built upon a file synchronization tool.

Enter NNCP

Let's look at a tool that's especially suited for this: NNCP, which I've been writing about a lot lately. NNCP is designed to handle file exchange and remote execution with remote computers in an asynchronous, store-and-forward manner. NNCP packets are themselves encrypted and authenticated. NNCP traditionally is source-routed (that is, you configure it so that machine A reaches machine D by relaying through B and C), and the packets are onion-routed. NNCP packets can be exchanged by a TCP call, a tar-like stream, copying files to something like a USB stick and physically transporting it to the remote, etc. This works really well and I've been using it myself. But it gets complicated if the network topology isn't fixed; it is difficult to reroute packets due to the onion routing, for instance. There are various workarounds that could be used, but why not just use Syncthing as a transport in those cases?

nncp-xfer is the command that exchanges packets by writing them to, and reading them from, a directory. It is what you'd use to exchange packets on a USB stick. And what you'd use to exchange packets via Syncthing.
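A minimal sketch of that flow (the folder path and node name are hypothetical; it assumes NNCP is already configured on both machines and that the directory is a Syncthing shared folder):

# Machine A: queue data for the node "home", then write the encrypted
# packets into the Syncthing-synced folder
nncp-file backup.tar.zst home:
nncp-xfer -tx /sync/nncp
# Machine B, once Syncthing has replicated the folder: ingest and process
nncp-xfer -rx /sync/nncp
nncp-toss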
nncp-xfer writes packets in a RECIPIENT/SENDER/PACKET directory structure, so it is perfectly fine to have multiple systems exchanging packets in a single Syncthing synced folder tree. This structure also allows leaf nodes to only carry the particular packets they're interested in. The packets are all encrypted, so they can be freely synced wherever. Since Syncthing opportunistically syncs a shared folder with any device the folder is shared with, a phone could very easily be the NNCP transport, even if it has no idea what NNCP is. It could carry NNCP packets back and forth between sites, or to the Internet, or whatever. NNCP supports file transmission, file request, and remote execution, all subject to controls, of course. It is easy to integrate with Exim or Postfix to use as a mail transport, Git transport, and so forth. I use it for backups. It would be quite easy to have it send those backups (encrypted zfs send) via nncp-xfer to Syncthing instead of the usual method, and then if I've shared the Syncthing folder with my phone, all I need to do is bring the phone into Internet range and they get sent. nncp-xfer will normally remove the packets from the xfer directory as it ingests them, so the space will only be consumed on the phone (and laptop) until we know the packets made it to their destination. Pretty slick, eh?

31 January 2021

John Goerzen: The Hidden Drawbacks of P2P (And a Defense of Signal)

Not long ago, I posted a roundup of secure messengers with off-the-grid capabilities. Some conversation followed, which led me to consider some of the problems with P2P protocols.

P2P and Privacy

Brave adopting IPFS has driven a lot of buzz lately. IPFS is essentially a decentralized, distributed web. This concept has a lot of promise. But take a look at the IPFS privacy document. Some things to highlight: So in this case, you have traded giving information about what you request to specific sites for giving it to potentially hundreds of untrusted peers, some of which may be logging this for nefarious purposes. Worse, you have a durable PeerID that can be used for tracking and tied to your IP address: a data collector's dream. This PeerID, combined with DHT requests and the CIDs (Content IDs) of the things you host (implying you viewed them in the past), can be used to establish a picture of what you are requesting now and requested recently. Similar can be said of everything from Scuttlebutt to GNU Jami; any service that operates on a P2P basis will likely reveal your IP, and tie your identity to it (and your IP address history). In some cases, as with Jami, this would be limited to friends you add; in others, as with Scuttlebutt and IPFS, it could be revealed to anyone. The advantages of P2P are undeniable and profound, but few are effectively addressing the privacy implications. The one I know of that is, Briar, routes all traffic over Tor; every node is reached by a Tor onion service.

Federation: somewhat better

In a federated model, every client connects to a server, and there are many servers participating in a federation with each other. Matrix and Mastodon are examples of a federated model. In this scenario, only one server (your own homeserver) can track you by IP. End-to-end encryption is certainly possible in a federated model, and Matrix supports it. This does give a third party (the specific server you use) knowledge of your IP, but that knowledge can be significantly limited. A downside of this approach is that if your particular homeserver is down, you are unable to communicate. Truly decentralized P2P solutions don't have that problem, though they do have a related one, which is that clients communicating with each other must both be online simultaneously in order for messages to be transmitted, and this can be a real challenge for mobile devices.

Centralization and Signal

Signal is centralized; it has one central server farm, and if it is down, you can't communicate or choose any other server, either. We saw it go down recently after Elon Musk mentioned it. Still, I recommend Signal for the general public. Here's why. Signal brings encryption and privacy to meet people where they're at, not the other way around. People don't have to choose a server, it can automatically recognize contacts that use Signal, it has emojis, attachments, secure voice and video calling, and (aside from the Musk incident) it all just works. It feels like, and is, a polished, modern experience with the bells and whistles people are used to. I'm a huge fan of Matrix (aka Element) and even run my own instance. It has huge promise. But it is Not. There. Yet. Why do I say this about Matrix? Again, I love Matrix. I use it every day to interact with Matrix, IRC, Slack, and Discord channels. It has a ton of promise. But would I count on it to carry a "my car's broken down and I'm stranded" message? No. How about some of the other options out there? I mentioned Briar above.
It's fantastic and its offline options are novel and promising. But in common usage, it can't deliver a message unless both devices are online simultaneously, and it doesn't run on iOS (though both are being worked on). It also can't send photos or do voice or video calling. Some of these same limitations apply to most of the other Signal alternatives also: either that, or they are encryption-optional, or terribly hard to set up and use. I recently mentioned Status, which shows a ton of promise, but has no voice or video calling capabilities. Scuttlebutt is a fantastic protocol with extremely difficult onboarding (lengthy process, error-prone finding a pub, multi-GB initial download, etc.). And many of these leak IP addresses as discussed above.

So Signal gives people: If you are going to tell someone, "it's so EASY to get your texts away from Facebook and AT&T", then Signal is the thing you've got to point them to. It may not be in two years, but for now, it is. Do not let the perfect be the enemy of the good. It advances the status quo without harming usability, which nothing else does yet. I am aware of all of the very legitimate criticisms of Signal. They are real and they are why I am excited that there are so many alternatives with promise, some of which I use actively. Let us technical people use, debug, contribute to, and evangelize the alternatives. And while we're doing that, tell Grandma to contact us on Signal.

19 January 2021

John Goerzen: Roundup of Secure Messengers with Off-The-Grid Capabilities (Distributed/Mesh Messengers)

Amid all the conversation about Signal, and the debate over decentralization, one thing has often not been raised: all of these things require an Internet connection. Of course, you might say. Internet is everywhere these days. Well, not so much, and it turns out there are some very good reasons that people might want messengers that work offline. Here are some examples:

How do they work?

These all use some form of local radio signal. Some, such as Briar, may use short-range Bluetooth and Wifi, while others use radios such as LoRa that can reach several miles with low power. I've written quite a bit about LoRa before, and its unique low-speed but extreme-distance radio capabilities, even on low power. One common thread through these is that most of them are Android-only, though many are compatible with F-Droid and privacy-enhanced Android distributions. Every item on this list uses full end-to-end encryption (E2EE). Let's dive on in.

Briar

Of all the options mentioned here, Briar is the one that bridges the traditional Internet-based approach with alternative options the best. It offers three ways for distributing data: As far as I can tell, there is no centralized server in Briar at all. Your "account", such as it is, lives entirely within your device; if you wipe your device, you will have to make a new account and re-establish contacts. The use of Tor is also neat to see; it ensures that an adversary can't tell, just from that, that you're using Briar at all, though of course timing analysis may still be possible (and Bluetooth and Wifi use may reveal some of who is communicating). Briar features several types of messages (detailed in the manual), which really are just different spins on communication, which they liken to metaphors people are familiar with: By default, Briar raises an audible notification for incoming messages of all types. This is configurable for each type. Blogs have a way to reblog (even a built-in RSS reader to facilitate that), but framed a different way, they are broadcast messages. They could, for instance, be useful for a "send help" message to everyone (assuming that people haven't all shut off notifications of blogs due to others using them different ways). Briar's "how it works" page has an illustration specifically of how blogs are distributed. I'm unclear on some of the details, and to what extent this applies to other kinds of messages, but one thing that you can notice from this is that a person A could write a broadcast message without Internet access, person B could receive it via Bluetooth or whatever, and then when person B gets Internet access again, the post could be distributed more widely. However, it doesn't appear that Briar is really a full mesh, since only known contacts in the distribution path for the message would repeat it.

There are some downsides to Briar. One is that, since an account is fully localized to a device, one must have a separate account for each device. That can lead to contacts having to pick a specific device to send a message to. There is an online indicator, which may help, but it's definitely not the kind of seamless experience you get from Internet-only messengers. Also, it doesn't support migrating to a new phone, live voice/video calls, or attachments, but attachments are in the works. All in all, a solid communicator, and it is the only one on this list that works 100% with the hardware everyone already has.
While Bluetooth and Wifi have far more limited range than the other entries, there is undeniably convenience in not needing any additional hardware, and it may be particularly helpful when extra bags/pockets aren't available. Also, Briar is fully Open Source.

Meshtastic

Meshtastic is a radio-first LoRa mesh project. What do I mean by radio-first? Well, basically cell phones are how you interact with Meshtastic, but they are optional. The hardware costs about $30 and the batteries last about 8 days. Range between nodes is a few miles in typical conditions (up to 11km / 7mi in ideal conditions), but nodes act as repeaters, so it is quite conceivable to just drop a node in the middle if you and contacts will be far apart. The project estimates that around 2000 nodes are in operation, and the network is stronger the more nodes are around. The getting started site describes how to build one. Most Meshtastic device builds have a screen and some buttons. They can be used independently from the Android app to display received messages, distance and bearing to other devices (assuming both have a GPS enabled), etc. This video is an introduction showing it off; this one goes over the hardware buttons. So even if your phone is dead, you can at least know where your friends are. Incidentally, the phone links up to the radio board using Bluetooth, and can provide a location source if you didn't include one in your build. There are ideas about solar power for Meshtastic devices, too. Meshtastic doesn't, as far as I know, have an option for routing communication over the Internet, but the devices appear to be very thoughtfully-engineered and easy enough to put together. This one is definitely on my list to try.

Ripple-based devices

This is based on the LoRa Mesh Radio Instructables project, and is similar in concept to Meshtastic. It uses similar hardware and a similar app, but also has an option with a QWERTY hardware keyboard available, for those that want completely phone-free operation while still being able to send messages. There are a number of related projects posted at Instructables: a GPS tracker, some sensors, etc. These are variations on the same basic concept. These use the Ripple firmware, which is not open source, so I haven't pursued it further.

GoTenna

For people that want less of a DIY model, and don't mind proprietary solutions, there are two I'll mention. The first is GoTenna Mesh, which is LoRa-based and sells units for $90 each. However, there are significant community concerns about the longevity of the project, as GoTenna has re-focused on government and corporate work. The Android app hasn't been updated in 6 months despite a number of reviews citing issues, and the iOS app is also crusty.

Beartooth

Even more expensive at $125 each is the Beartooth. Also a proprietary option; I haven't looked into it more, but they are specifically targeting backwoods types of markets.

Do not use: Bridgefy

Bridgefy was briefly prominent since it was used during the Hong Kong protests. However, numerous vulnerabilities have been demonstrated, and the developers have said they are re-working the app to address them. I wouldn't recommend it for now.

Alternatives: GMRS handhelds

In the USA, GMRS voice handhelds are widely available. Although a license is required, it is simple (no exam) and cheap ($35) and extends to a whole family. GMRS radios also interoperate with FRS radios, which require no license and share some frequencies, but are limited to lower power (though are often sufficient).
Handheld GMRS radios that use up to 5W of power are readily available. A voice signal is a lot harder to carry for a long distance than a very low-bandwidth digital one, so even with much more power you will probably not get the same kind of range you will with something like Meshtastic, and they don't come with any kind of security or encryption at all. However, for basic communication, they are often a useful tool.

12 January 2021

John Goerzen: Remote Directory Tree Comparison, Optionally Asynchronous and Airgapped

Note: this is another article in my series on asynchronous communication in Linux with UUCP and NNCP.

In the previous installment on store-and-forward backups, I mentioned how easy it is to do with ZFS, and some of the tools that can be used to do it without ZFS. A lot of those tools are a bit less robust, so we need some sort of store-and-forward mechanism to verify backups. To be sure, verifying backups is good with ANY scheme, and this could be used with ZFS backups also. So let's say you have a shiny new backup scheme in place, and you'd like to verify that it's working correctly. To do that, you need to compare the source directory tree on machine A with the backed-up directory tree on machine B. Assuming a conventional setup, here are some ways you might consider to do that: The first two options are not particularly practical for large datasets, though I note that the second is compatible with airgapping. Using rsync requires both systems to be online at the same time to perform the comparison. What would be really nice here is a tool that would write out lots of information about the files on a system: their names, sizes, last modified dates, maybe even sha256sums and other data. This file would be far smaller than the directory tree itself, would compress nicely, and could be easily shipped to an airgapped system via NNCP, UUCP, a USB drive, or something similar.

Tool choices

It turns out there are already quite a few tools in Debian (and other Free operating systems) to do this, and half of them are named mtree (though, of course, not all mtrees are compatible with each other). We'll look at some of the options here. I've made a simple test directory for illustration purposes with these commands:
mkdir test
cd test
echo hi > hi
ln -s hi there
ln hi foo
touch empty
mkdir emptydir
mkdir somethingdir
cd somethingdir
ln -s ../there
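A consistent timestamp across the whole tree (as seen in the listings below) could then be set like this; a sketch assuming GNU touch, since the exact command isn't shown:

# set every file, directory, and symlink under test/ to the same epoch time
find test -exec touch -h -d @1610421833 {} +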
I then also used touch to set all files to a consistent timestamp for illustration purposes.

Tool option: getfacl (Debian package: acl)

This comes with the acl package, but can be used for purposes other than ACLs. Unfortunately, it doesn't come with a tool to directly compare its output with a filesystem (setfacl, for instance, can apply the permissions listed but won't compare). It ignores symlinks and doesn't show sizes or dates, so it is ineffective for our purposes. Example output:
$ getfacl --numeric -R test
...
# file: test/hi
# owner: 1000
# group: 1000
user::rw-
group::r--
other::r--
...
Tool option: fmtree, the FreeBSD mtree (Debian package: freebsd-buildutils)

fmtree can prepare a specification based on a directory tree, and compare a directory tree to that specification. The comparison also is aware of files that exist in a directory tree but not in the specification. The specification format is a bit on the odd side, but works well enough with fmtree. Here's a sample output with defaults:
$ fmtree -c -p test
...
# .
/set type=file uid=1000 gid=1000 mode=0644 nlink=1
.               type=dir mode=0755 nlink=4 time=1610421833.000000000
    empty       size=0 time=1610421833.000000000
    foo         nlink=2 size=3 time=1610421833.000000000
    hi          nlink=2 size=3 time=1610421833.000000000
    there       type=link mode=0777 time=1610421833.000000000 link=hi
... skipping ...
# ./somethingdir
/set type=file uid=1000 gid=1000 mode=0777 nlink=1
somethingdir    type=dir mode=0755 nlink=2 time=1610421833.000000000
    there       type=link time=1610421833.000000000 link=../there
# ./somethingdir
..
..
You might be wondering here what it does about special characters, and the answer is that it has octal escapes, so it is 8-bit clean. To compare, you can save the output of fmtree to a file, then run like this:
cd test
fmtree < ../test.fmtree
If there is no output, then the trees are identical. Change something and you get a line of output explaining each difference. You can also use fmtree -U to change things like modification dates to match the specification. fmtree also supports quite a few optional keywords you can add with -K. They include things like file flags, user/group names, various types of hashes, and so forth. I'll note that none of the options can let you determine which files are hardlinked together. Here's an excerpt with -K sha256digest added:
    empty       size=0 time=1610421833.000000000 \
                sha256digest=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    foo         nlink=2 size=3 time=1610421833.000000000 \
                sha256digest=98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
If you include a sha256digest in the spec, then when you verify it with fmtree, the verification will also include the sha256digest. Obviously fmtree -U can't correct a mismatch there, but of course it will detect and report it.
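Since such a spec is small text that compresses well, it ships nicely over the store-and-forward links discussed at the top; a hedged sketch (the NNCP node name and paths are hypothetical):

# generate a hashed spec for the source tree and compress it
fmtree -c -K sha256digest -p /home | zstd > /tmp/home.fmtree.zst
# queue it for the airgapped verification system
nncp-file /tmp/home.fmtree.zst offsys:

On the far side, one would zstdcat the received file and feed it to fmtree against the backed-up tree to compare.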
Tool option: mtree, the NetBSD mtree (Debian package: mtree-netbsd)

mtree produces (by default) output very similar to fmtree. With minor differences (such as the name of the sha256digest in the output), the discussion above about fmtree also applies to mtree. There are some differences, and the most notable is that mtree adds a -C option which reads a spec and converts it to a "format that's easier to parse with various tools." Here's an example:

$ mtree -c -K sha256digest -p test | mtree -C
. type=dir uid=1000 gid=1000 mode=0755 nlink=4 time=1610421833.0 flags=none 
./empty type=file uid=1000 gid=1000 mode=0644 nlink=1 size=0 time=1610421833.0 flags=none 
./foo type=file uid=1000 gid=1000 mode=0644 nlink=2 size=3 time=1610421833.0 flags=none 
./hi type=file uid=1000 gid=1000 mode=0644 nlink=2 size=3 time=1610421833.0 flags=none 
./there type=link uid=1000 gid=1000 mode=0777 nlink=1 link=hi time=1610421833.0 flags=none 
./emptydir type=dir uid=1000 gid=1000 mode=0755 nlink=2 time=1610421833.0 flags=none 
./somethingdir type=dir uid=1000 gid=1000 mode=0755 nlink=2 time=1610421833.0 flags=none 
./somethingdir/there type=link uid=1000 gid=1000 mode=0777 nlink=1 link=../there time=1610421833.0 flags=none 
Most definitely an improvement in both space and convenience, while still retaining the relevant information. Note that if you want the sha256digest in the formatted output, you need to pass the -K to both mtree invocations. I could have done that here, but it is easier to read without it. mtree can verify a specification in either format. Given what I'm about to show you about bsdtar, this should illustrate why I bothered to package mtree-netbsd for Debian. Unlike fmtree, the mtree -U command will not adjust modification times based on the spec, but it will report on differences.

Tool option: bsdtar (Debian package: libarchive-tools)

bsdtar is a fascinating program that can work with many formats other than just tar files. Among the formats it supports is the NetBSD mtree "pleasant" format (mtree -C compatible). bsdtar can also convert between the formats it supports. So, put this together: bsdtar can convert a tar file to an mtree specification without extracting the tar file. bsdtar can also use an mtree specification to override the permissions on files going into tar -c, so it is a way to prepare a tar file with things owned by root without resorting to tools like fakeroot. Let's look at how this can work:
$ cd test
$ bsdtar --numeric -cf - --format=mtree .

. time=1610472086.318593729 mode=755 gid=1000 uid=1000 type=dir
./empty time=1610421833.0 mode=644 gid=1000 uid=1000 type=file size=0
./foo nlink=2 time=1610421833.0 mode=644 gid=1000 uid=1000 type=file size=3
./hi nlink=2 time=1610421833.0 mode=644 gid=1000 uid=1000 type=file size=3
./ormat\075mtree time=1610472086.318593729 mode=644 gid=1000 uid=1000 type=file size=5632
./there time=1610421833.0 mode=777 gid=1000 uid=1000 type=link link=hi
./emptydir time=1610421833.0 mode=755 gid=1000 uid=1000 type=dir
./somethingdir time=1610421833.0 mode=755 gid=1000 uid=1000 type=dir
./somethingdir/there time=1610421833.0 mode=777 gid=1000 uid=1000 type=link link=../there
You can use mtree -U to verify that as before. With the --options mtree: set, you can also add hashes and similar to the bsdtar output. Since bsdtar can use input from tar, pax, cpio, zip, iso9660, 7z, etc., this capability can be used to create verification of the files inside quite a few different formats. You can convert with bsdtar -cf output.mtree --format=mtree @input.tar. There are some foibles with directly using these converted files with mtree -U, but usually minor changes will get it there.

Side mention: stat(1) (Debian package: coreutils)

This tool isn't included because it won't operate recursively, but it is a tool in the similar toolbox.

Putting It Together

I will still be developing a complete non-ZFS backup system for NNCP (or UUCP) in a future post. But in the meantime, here are some ideas you can reflect on: I will further develop at least one of these ideas in a future post.

Bonus: cross-tool comparisons

In my mtree-netbsd packaging, I added tests like this to compare between tools:
fmtree -c -K $(MTREE_KEYWORDS) | mtree
mtree -c -K $(MTREE_KEYWORDS) | sed -e 's/\(md5\|sha1\|sha256\|sha384\|sha512\)=/\1digest=/' -e 's/rmd160=/ripemd160digest=/' | fmtree
bsdtar -cf - --options 'mtree:uname,gname,md5,sha1,sha256,sha384,sha512,device,flags,gid,link,mode,nlink,size,time,uid,type,uname' --format mtree . | mtree
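Putting bsdtar and mtree together, here is a hedged end-to-end sketch of verifying a tar archive's contents against a live tree (the paths are hypothetical):

# create an archive, then derive an mtree spec from it without extracting
bsdtar -cf backup.tar -C /home .
bsdtar -cf backup.mtree --format=mtree @backup.tar
# verify a directory tree against the spec derived from the archive
mtree -f backup.mtree -p /home

As noted above, a few minor tweaks to the generated spec may be needed before it verifies cleanly.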

John Goerzen: The Good, Bad, and Scary of the Banning of Donald Trump, and How Decentralization Makes It All Better

It is undeniable that banning Donald Trump from Facebook, Twitter, and similar sites is a benefit for the moment. It may well save lives, perhaps lots of lives. But it raises quite a few troubling issues.

First, as EFF points out, these platforms have privileged speakers with power, especially politicians, over regular users. For years now, it has been obvious to everyone that Donald Trump has been violating policies on both platforms, and yet they did little or nothing about it. The result we saw last week was entirely foreseeable, and indeed WAS foreseen, including by elements in those companies themselves. (ACLU also raises some good points.) Contrast that with how others get treated. Facebook, two days after the coup attempt, banned Benjamin Wittes, apparently because he mentioned an Atlantic article opposed to nutcase conspiracy theories. The EFF has also documented many more egregious examples: taking down documentation of war crimes, childbirth images, black activists showing the racist messages they received, women discussing online harassment, etc. The list goes on; YouTube, for instance, has often been promoting far-right violent videos while removing peaceful LGBTQ ones. In short, have we simply achieved legal censorship by outsourcing it to dominant corporations?

It is worth pausing at this point to recognize two important principles: First, that we do not see it as right to compel speech. Secondly, that there exist communications channels and other services that nobody is calling on to suspend Donald Trump. Let's dive into those a little bit. There have been no prominent calls for AT&T, Verizon, Gmail, or whomever provides Trump and his campaign with cell phones or email to suspend their service to him. Moreover, the gas stations that fuel his vehicles and the airports that service his plane continue to provide those services, and nobody has seriously questioned that, either. Even his Apple phone that he uses to post to Twitter remains, as far as I know, fully active. Secondly, imagine you were starting up a small web forum focused on raising tomato plants. It is, and should be, well within your rights to keep tomato-haters out, as well as people that have no interest in tomatoes but would rather talk about rutabagas, politics, or Mars. If you are going to host a forum about tomatoes, you have the right to keep it a forum about tomatoes; you cannot be forced to distribute someone else's speech. Likewise in traditional media, a newspaper cannot be forced to print every letter to the editor in full.

In law, there is a notion of a common carrier, which provides services to the general public without discrimination. Phone companies and ISPs fall under this. Facebook, Twitter, and tomato sites don't. But consider what happens if Facebook bans you. You might be using Facebook-owned Whatsapp to communicate with family and friends, and suddenly find yourself unable to ask someone to pick you up. Or your treasured family photos might be in Facebook-owned Instagram, lost forever. It's not just Facebook; similar things happen with Google, locking people out of their phones and laptops, their emails, even their photos. Is it right that Facebook and Google aren't regulated as common carriers? Perhaps, or perhaps we need some line of demarcation between their speech-to-the-public services (Facebook timeline posts, YouTube) and private communication (Whatsapp, Gmail). It's a thorny issue; should government be regulating speech instead? That's also fraught. So is corporate control.
Decentralization Helps Dramatically

With email, you get to pick your email provider (yes, there are two or three big ones, but still plenty of others). Each email provider will have its own set of things it considers acceptable, and its own set of other servers and accounts it's willing to exchange mail with. (It is extremely common for mail providers to choose not to accept mail from various other mail servers based on ISP, IP address, reputation, and so forth.) What if we could do something like that for Twitter and Facebook? Let you join whatever instance you like. Maybe one instance is all about art and they don't talk about politics. Or another is all about Free Software and they don't have advertising. And then there are plenty of open instances that accept anything that's respectful. And, like email, people on one server can interact with those using another just as easily as if they were using the same one. Well, this isn't hypothetical; it already exists in the Fediverse. The most common option is Mastodon, and it so happens that a month ago I wrote about its benefits for other reasons, and included some links on getting started. There is no reason that we must all let our online speech be controlled by companies with a profit motive to keep hate speech on their platforms. There is no reason that we must all have a single set of rules, or accept strong corporate or government control, either. The quality of conversation on Mastodon is far higher than either Twitter or Facebook; decentralization works and it's here today.

7 January 2021

John Goerzen: This Is How Tyrants Go: Alone

I remember reading an essay a month or so ago (sadly, I forget where) talking about how things end for tyrants. If I were to sum it up, it would be with the word "alone." Their power fading, they find that they had few true friends or believers; just others that were greedy for power or riches and, finding those no longer to be had, depart the sinking ship. The article looked back at examples like Nixon and examples from the 20th century in Europe and around the world.

Today we saw images of a failed coup attempt. But we also saw hope. Already senior staff in the White House are resigning. Ones that had been ardent supporters. In the end, just 6 senators supported the objection to the legitimate electors. Six. Lindsey Graham, Mike Pence, and Mitch McConnell all deserted Trump. CNN reports that there are serious conversations about invoking the 25th amendment and removing him from office, because even Republicans are to the point of believing that America should not have two more weeks of this man. Whether those efforts are successful or not, I don't know. What I do know is that these actions have awakened many people, in a way that nothing else could for four years, to the dangers of Trump and, in the end, have bolstered the cause of democracy. Hard work will remain, but today, Donald Trump is in the White House alone, abandoned by allies and blocked by Twitter. And we know that within two weeks, he won't be there at all. We will get through this.

4 January 2021

John Goerzen: More Topics on Store-And-Forward (Possibly Airgapped) ZFS and Non-ZFS Backups with NNCP

Note: this is another article in my series on asynchronous communication in Linux with UUCP and NNCP.

In my previous post, I introduced a way to use ZFS backups over NNCP. In this post, I'll expand on that and also explore non-ZFS backups.

Use of nncp-file instead of nncp-exec

The previous example used nncp-exec (like UUCP's uux), which lets you pipe stdin in, then queues up a request to run a given command with that input on a remote. I discussed that NNCP doesn't guarantee order of execution, but that for the ZFS use case, that was fine since zfs receive would just fail (causing NNCP to try again later). At present, nncp-exec stores the data piped to it in RAM before generating the outbound packet (the author plans to fix this shortly). That made it unusable for some of my backups, so I set it up another way: with nncp-file, the tool to transfer files to a remote machine. A cron job then picks them up and processes them. On the machine being backed up, we have to find a way to encode the dataset to be received. I chose to do that as part of the filename, so the updated simplesnap-queue could look like this:
#!/bin/bash
set -e
set -o pipefail

DEST="`echo $1 | sed 's,^tank/simplesnap/,,'`"
FILE="bakfsfmt2-`date "+%s.%N"`.$$_`echo "$DEST" | sed 's,/,@,g'`"

echo "Processing $DEST to $FILE" >&2

# stdin piped to this
zstd -8 - \
  | gpg --compress-algo none --cipher-algo AES256 -e -r 012345... \
  | su nncp -c "/usr/local/nncp/bin/nncp-file -nice B -noprogress - 'backupsvr:$FILE'" >&2

echo "Queued $DEST to $FILE" >&2
I've added compression and encryption in the script above as well; more on that below. On the backup server, we would define a different incoming directory for each node in nncp.hjson. For instance:
host1: {
...
   incoming: "/var/local/nncp-backups-incoming/host1"
}
host2: {
...
   incoming: "/var/local/nncp-backups-incoming/host2"
}
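For a sense of what lands there: once nncp-toss unpacks an incoming packet from host1, a file following the $FILE naming pattern above would appear at a path like this (the timestamp, PID, and dataset are hypothetical):

/var/local/nncp-backups-incoming/host1/bakfsfmt2-1612345678.123456789.4242_host1@home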
I'll present the scanning script in a bit.

Offsite Backup Rotation

Most of the time, you don't want just a single drive to store the backups. You'd like to have a set. At minimum, one wouldn't be plugged in so lightning wouldn't ruin all your backups. But maybe you'd store a second drive at some other location you have access to (friend's house, bank box, etc.). There are several ways you could solve this: The third option can be helped with NNCP, too. One way is to create separate NNCP installations for each of the drives that you store data on. Then, whenever one is plugged in, the appropriate NNCP config will be loaded and appropriate packets received and processed. The neighbor machine (the spooler) would just store up packets for the offsite drive until it comes back onsite (or, perhaps, your airgapped USB transport would do this). Then when it's back onsite, all the queued-up ZFS sends get replayed and the backups replicated. Now, how might you handle this with NNCP? The simple way would be to have each system generating backups send them to two destinations. For instance:
zstd -8 - | gpg --compress-algo none --cipher-algo AES256 -e -r 07D5794CD900FAF1D30B03AC3D13151E5039C9D5 \
  | tee >(su nncp -c "/usr/local/nncp/bin/nncp-file -nice B+5 -noprogress - 'backupdisk1:$FILE'") \
        >(su nncp -c "/usr/local/nncp/bin/nncp-file -nice B+5 -noprogress - 'backupdisk2:$FILE'") \
  > /dev/null

You could probably also more safely use pee(1) (from moreutils) to do this.
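A hedged sketch of that pee variant (same hypothetical key and node names as above; pee copies its stdin to each command and waits for them, avoiding some of the silent-failure modes of process substitution):

zstd -8 - | gpg --compress-algo none --cipher-algo AES256 -e -r 07D5794CD900FAF1D30B03AC3D13151E5039C9D5 \
  | pee "su nncp -c \"/usr/local/nncp/bin/nncp-file -nice B+5 -noprogress - 'backupdisk1:$FILE'\"" \
        "su nncp -c \"/usr/local/nncp/bin/nncp-file -nice B+5 -noprogress - 'backupdisk2:$FILE'\""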
This has an unfortunate result of doubling the network traffic from every machine being backed up. So an alternative option would be to queue the packets to the spooling machine, and run a distribution script from it; something like this, in part:
INCOMINGDIR="/var/local/nncp-bakfs-incoming"
LOCKFILE="$INCOMINGDIR/.lock"
printf -v EVAL_SAFE_LOCKFILE '%q' "$LOCKFILE"
if dotlockfile -r 0 -l -p "${LOCKFILE}"; then
  logit "Lock obtained at ${LOCKFILE} with dotlockfile"
  trap 'ECODE=$?; dotlockfile -u '"${EVAL_SAFE_LOCKFILE}"'; exit $ECODE' EXIT INT TERM
else
  logit "Could not obtain lock at $LOCKFILE; $0 likely already running."
  exit 0
fi

logit "Scanning queue directory..."
cd "$INCOMINGDIR"
for HOST in *; do
   cd "$INCOMINGDIR/$HOST"
   for FILE in bakfsfmt2-*; do
           if [ -f "$FILE" ]; then
                   for BAKFS in backupdisk1 backupdisk2; do
                           runcommand nncp-file -nice B+5 -noprogress "$FILE" "$BAKFS:$HOST/$FILE"
                   done
                   runcommand rm "$FILE"
           else
                   logit "$HOST: Skipping $FILE since it doesn't exist"
           fi
   done
done
logit "Scan complete."
Security Considerations

You'll notice that in my example above, the encryption happens as the root user, but nncp is called under su. This means that even if there is a vulnerability in NNCP, the data would still be protected by GPG. I'll also note here that many sites run ssh as root unnecessarily; the same principles should apply there. (ssh has had vulnerabilities in the past as well.) I could have used gpg's built-in compression, but zstd is faster and better, so we can get good performance by using fast compression and piping that to an algorithm that can use hardware acceleration for encryption.

I strongly encourage considering transport, whether ssh or NNCP or UUCP, to be untrusted. Don't run it as root if you can avoid it. In my example, the nncp user, which all NNCP commands are run as, has no access to the backup data at all. So even if NNCP were compromised, my backup data wouldn't be. For even more security, I could also sign the backup stream with gpg and validate that on the receiving end. I should note, however, that this conversation assumes that a network- or USB-facing ssh or NNCP is more likely to have an exploitable vulnerability than is gpg (which here is just processing a stream). This is probably a safe assumption in general. If you believe gpg is more likely to have an exploitable vulnerability than ssh or NNCP, then obviously you wouldn't take this particular approach.

On the zfs side, the use of -F with zfs receive is avoided; this could lead to a compromised backed-up machine generating a malicious rollback on the destination. Backup zpools should be imported with -R or -N to ensure that a malicious mountpoint property couldn't be used to cause an attack. I choose to use zfs receive -u -o readonly=on, which is compatible with both unmounted backup datasets and zpools imported with -R (or both). To access the data in a backup dataset, you would normally clone it and access it there.
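A hedged sketch of the stream-signing idea mentioned above (the key ID is the same elided placeholder as before; gpg -d verifies an embedded signature while decrypting, and a real script would also check gpg's status output):

# sender: sign with this host's key, then encrypt to the backup server's key
zstd -8 - | gpg -s --compress-algo none --cipher-algo AES256 -e -r 012345... \
  | su nncp -c "/usr/local/nncp/bin/nncp-file -nice B -noprogress - 'backupsvr:$FILE'"
# receiver: decryption also verifies the embedded signature
gpg -q -d < "$FILE" | zstdcat | zfs receive -u -o readonly=on "$STORE/$DEST"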
#!/bin/bash
set -e
set -o pipefail
# Log a message
logit () {
   logger -p info -t "$(basename "$0")[$$]" "$1"
}
# Log an error message
logerror () {
   logger -p err -t "$(basename "$0")[$$]" "$1"
}
# Log stdin with the given code.  Used normally to log stderr.
logstdin () {
   logger -p info -t "$(basename "$0")[$$/$1]"
}
# Run command, logging stderr and exit code
runcommand () {
   logit "Running $*"
   if "$@" 2> >(logstdin "$1") ; then
      logit "$1 exited successfully"
      return 0
   else
       RETVAL="$?"
       logerror "$1 exited with error $RETVAL"
       return "$RETVAL"
   fi
}
STORE=backups/simplesnap
INCOMINGDIR=/backups/nncp/incoming
if ! [ -d "$INCOMINGDIR" ]; then
        logerror "$INCOMINGDIR doesn't exist"
        exit 0
fi
LOCKFILE="/backups/nncp/.nncp-backups-zfs-scan.lock"
printf -v EVAL_SAFE_LOCKFILE '%q' "$LOCKFILE"
if dotlockfile -r 0 -l -p "$ LOCKFILE "; then
  logit "Lock obtained at $ LOCKFILE  with dotlockfile"
  trap 'ECODE=$?; dotlockfile -u '"$ EVAL_SAFE_LOCKFILE "'; exit $ECODE' EXIT INT TERM
else
  logit "Could not obtain lock at $LOCKFILE; $0 likely already running."
  exit 0
fi
EXITCODE=0
cd "$INCOMINGDIR"
logit "Scanning queue directory..."
for HOST in *; do
    HOSTPATH="$INCOMINGDIR/$HOST"
    # files like backupsfmt2-134.13134_dest
    for FILE in "$HOSTPATH"/backupsfmt2-[0-9]*_?*; do
        if [ ! -f "$FILE" ]; then
            logit "Skipping non-existent $FILE"
            continue
        fi
        # Now, $DEST will be HOST/DEST.  Strip off the @ also.
        DEST=" echo "$FILE"   sed -e 's/^.*backupsfmt2[^_]*_//' -e 's,@,/,g' "
        if [ -z "$DEST" ]; then
            logerror "Malformed dest in $FILE"
            continue
        fi
        HOST2=" echo "$DEST"   sed 's,/.*,,g' "
        if [ -z "$HOST2" ]; then
            logerror "Malformed DEST $DEST in $FILE"
            continue
        fi
        if [ ! "$HOST" = "$HOST2" ]; then
            logerror "$DIR: $HOST doesn't match $HOST2"
            continue
        fi
        logit "Processing $FILE to $STORE/$DEST"
            if runcommand gpg -q -d < "$FILE"   runcommand zstdcat   runcommand zfs receive -u -o readonly=on "$STORE/$DEST"; then
                logit "Successfully processed $FILE to $STORE/$DEST"
                runcommand rm "$FILE"
        else
                logerror "FAILED to process $FILE to $STORE/$DEST"
                EXITCODE=15
        fi
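As mentioned in the security considerations, the stream could additionally be signed and the signature validated before the data is trusted. A hypothetical sketch, with the snapshot name, recipient key, and dataset path all illustrative:
# Sender side: compress, then encrypt and sign in a single gpg pass.
zfs send tank/data@snap | zstd -3 | gpg -q --sign --encrypt -r backups@example.org \
    | su nncp -c "/usr/local/nncp/bin/nncp-file -nice B+5 -noprogress - 'backupdisk1:stream'"
# Receiver side: gpg -d both decrypts and verifies; the --status-fd output
# can be checked for a VALIDSIG line before the received dataset is trusted.
gpg -q -d --status-fd 3 < stream 3>/tmp/gpg-status \
    | zstdcat | zfs receive -u -o readonly=on backups/simplesnap/host/dataset
grep -q 'VALIDSIG' /tmp/gpg-status || logerror "No valid signature on stream"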
Applying These Ideas to Non-ZFS Backups ZFS backups made our job easier in a lot of ways: Some of these benefits you just won't get without ZFS (or something similar like btrfs), but let's see how we could apply these ideas to non-ZFS backups. I will explore the implementation of them in a future post. When I say "non-ZFS", I am being a bit vague as to whether the source, the destination, or both systems are running a non-ZFS filesystem. In general I'll assume that neither are ZFS. The first and most obvious answer is to just tar up the whole system and send that every day. This is, of course, only suitable for small datasets on a fast network. These tarballs could be unpacked on the destination and stored more efficiently via any number of methods (hardlink trees, a block-level deduplicator like borg or rdedup, or even just simply compressed tarballs). To make the network trip more efficient, something like rdiff or xdelta could be used. A signature file could be stored on the machine being backed up (generated via tee/pee at stream time), and the next run could simply send an rdiff delta over NNCP. This would be quite network-efficient (see the sketch below), but still would require reading every byte of every file on every backup, and would also require quite a bit of temporary space on the receiving end (to apply the delta to the previous tarball and generate a new one). Alternatively, a program that generates incremental backup files such as rdup could be used. These could be transmitted over NNCP to the backup server, and unpacked there. While perhaps less efficient on the network -- every file with at least one modified byte would be retransmitted in its entirety -- it avoids the need to read every byte of unmodified files or to have enormous temporary space. I should note here that GNU tar claims to have an incremental mode, but it has a potential data loss bug. There are also some tools with algorithms that may apply well in this use case: syrep and fssync being the two most prominent examples, though rdedup (mentioned above) and the nascent asuran project may also be combinable with other tools to achieve this effect. I should, of course, conclude this section by mentioning btrfs. Every time I've tried it, I've run into serious bugs, and its status page indicates that only some of them have been resolved. I would not consider using it for something as important as backups. However, if you are comfortable with it, it is likely to be able to run in more constrained environments than ZFS and could probably be processed in much the same way as zfs streams.
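Returning to the rdiff idea above, here is a minimal sketch using librsync's rdiff tool. The node name backupsvr, the exec handles tarfull and tarpatch, and all paths are illustrative; a real version would also need to ensure tee's signature writer finishes before the final mv.
#!/bin/bash
# Hypothetical sketch: full-system tar with rdiff deltas queued via NNCP.
set -e
set -o pipefail
SIG=/var/local/backup.sig
if [ -f "$SIG" ]; then
    # Later runs: emit a delta against the previous signature, while tee
    # simultaneously records the signature of the new stream.
    tar -C / --one-file-system -cpf - . \
        | tee >(rdiff signature - "$SIG.new") \
        | rdiff delta "$SIG" - - \
        | su nncp -c "/usr/local/nncp/bin/nncp-exec -nice B -noprogress backupsvr tarpatch"
else
    # First run: send a full tarball and record its signature.
    tar -C / --one-file-system -cpf - . \
        | tee >(rdiff signature - "$SIG.new") \
        | su nncp -c "/usr/local/nncp/bin/nncp-exec -nice B -noprogress backupsvr tarfull"
fi
mv "$SIG.new" "$SIG"
On the receiving side, tarpatch would run rdiff patch against the stored previous tarball to produce the new one, which is exactly where the temporary-space cost mentioned above appears.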

30 December 2020

John Goerzen: Airgapped / Asynchronous Backups with ZFS over NNCP

In my previous articles in the series on asynchronous communication with the modern NNCP tool, I talked about its use for asynchronous, potentially airgapped, backups. The first article, How & Why To Use Airgapped Backups, laid out the foundations for this. Now let's dig into the details. Today's post will cover ZFS, because it has a lot of features that make it very easy to support in this setup. Non-ZFS backups will be covered later. The setup is actually about as simple as it is for SSH, but since people are less familiar with this kind of communication, I'm going to try to go into more detail here. Assumptions I am assuming a setup where: Hardware Let's start with hardware for the machine to hold the backups. I initially considered a Raspberry Pi 4 with 8GB of RAM. That would probably have been a suitable machine, at least for smaller backup sets. However, none of the Raspberry Pi machines support hardware AES encryption acceleration, and my Pi4 benchmarks at about 60MB/s for AES encryption. I want my backups to be encrypted, and decided this would just be too slow for my purposes. Again, if you don't need encrypted backups or don't care that much about performance (many people probably fall into this category), you can have a fully-functional Raspberry Pi 4 system for under $100 that would make a fantastic backup server. I wound up purchasing a Qotom-Q355G4 micro PC with a Core i5 for about $315. It has USB 3 ports and is designed as a rugged, long-lasting system. I have been using one of their older Celeron-based models as my router/firewall for a number of years now and it's been quite reliable. For backup storage, you can get a USB 3 external drive. My own preference is to get a USB 3 toaster (a device that lets me plug in SATA drives) so that I have more control over the underlying medium and can save the expense and hassle of a bunch of power supplies. In a future post, I will discuss drive rotation so you always have an offline drive. Then there is the question of transport to the backup machine. A simple solution would be to have a heavily-firewalled backup system that has no incoming ports open but makes occasional outgoing connections to one specific NNCP daemon on the spooling machine. However, for airgapped operation, it would also be very simple to use nncp-xfer to transport the data across on a USB stick or some such. You could set up automounting for a specific USB stick: plug it in, and all the spooled data is moved over; then plug it in to the backup system and it's processed, and any outbound email traffic or whatever is copied to the USB stick at that point too. The NNCP page has some more commentary about this kind of setup. Both are fairly easy to set up, and NNCP is designed to be transport-agnostic, so in this article I'm going to focus on how to integrate ZFS with NNCP. Operating System Of course, it should be no surprise that I set this up on Debian. As an added step, I did all the configuration in Ansible, stored in a local git repo. This adds a lot of work, but it means that it is trivial to periodically wipe and reinstall if any security issue is suspected. The git repo can be copied off to another system for storage and takes the system from freshly-installed to ready-to-use state. Security There is, of course, nothing preventing you from running NNCP as root. The zfs commands, obviously, need to be run as root. However, from a privilege separation standpoint, I have chosen to run everything relating to NNCP as an nncp user.
NNCP already does encryption, but if you prefer to have zero knowledge of the data even to NNCP, it's trivial to add gpg to the pipeline as well, and in fact I'll be demonstrating that in a future post for other reasons. Software Besides NNCP, there needs to be a system that generates the zfs send streams. For this project, I looked at quite a few. Most were designed to inspect the list of snapshots on a remote end, compare it to a list on the local end, and calculate a difference from there. This, of course, won't work for this situation. I realized my own simplesnap project was very close to being able to do this. It already used an algorithm of using specially-named snapshots on the machine being backed up, so it never needed any communication about what snapshots were present where. All it needed was a few more options to permit sending to a stream instead of zfs receive. I made those changes and they are available in simplesnap 2.0.0 or above. That version has also been uploaded to sid, and will work fine as-is on buster as well. Preparing NNCP I'm going to assume three hosts in this setup: The basic NNCP workflow documentation covers the basic steps. You'll need to run nncp-cfgnew on each machine. This generates a basic configuration, along with public and private keys for that machine. You'll copy the public key sets to the configurations of the other machines as usual. On the laptop, you'll add a via line like this:
backupsvr: {
  id: ....
  exchpub: ...
  signpub: ...
  noisepub: ...
  via: ["spooler"]
}
This tells NNCP that data destined for backupsvr should always be sent via spooler first. You can then arrange for the nncp-daemon to run on the spooler, and nncp-caller or nncp-call on the backupsvr. Or, alternatively, run airgapped between the two with nncp-xfer. Generating Backup Data Now, on the laptop, install simplesnap (2.0.0 or above). Although you won't be backing up to the local system, simplesnap still maintains a hostlock in ZFS. Prepare a dataset for it:
zfs create tank/simplesnap
zfs set org.complete.simplesnap:exclude=on tank/simplesnap
Then, create a script /usr/local/bin/runsimplesnap like this:
#!/bin/bash
set -e
simplesnap --store tank/simplesnap --setname backups --local --host "$(hostname)" \
   --receivecmd /usr/local/bin/simplesnap-queue \
   --noreap
su nncp -c '/usr/local/nncp/bin/nncp-toss -noprogress -quiet'
if ip addr | grep -q 192.168.65.64; then
  su nncp -c '/usr/local/nncp/bin/nncp-call -noprogress -quiet -onlinedeadline 1 spooler'
fi
The call to simplesnap sets it up to send the data to simplesnap-queue, which we'll create in a moment. The receivecmd, plus noreap, sets it up to run without ZFS on the local system. The call to nncp-toss will process any previously-received inbound NNCP packets, if there are any. Then, in this example, we do a very basic check to see if we're on the LAN (checking for 192.168.65.64), and if so, will establish a connection to the spooler to transmit the data. Of course, you could also do this over the Internet, with Tor, or whatever, but in my case, I don't want to do this automatically in case I'm tethered to mobile. I figure if I want to send backups in that case, I can fire up nncp-call myself. You can also use nncp-caller to set up automated connections on other schedules; there are a lot of options. Now, here's what /usr/local/bin/simplesnap-queue looks like:
#!/bin/bash
set -e
set -o pipefail
DEST=" echo $1   sed 's,^tank/simplesnap/,,' "
echo "Processing $DEST" >&2
# stdin piped to this
su nncp -c "/usr/local/nncp/bin/nncp-exec -nice B -noprogress backupsvr zfsreceive '$DEST'" >&2
echo "Queued for $DEST" >&2
This is a pretty simple script. simplesnap will call it with a path based on the store, with the hostname after; so, for instance, tank/simplesnap/laptop/root or some such. This script strips off the leading tank/simplesnap (which is a local fragment), leaving the host and dataset paths. Then it just pipes it to nncp-exec. -nice B classifies it as low-priority bulk data (so if you have some more important interactive data, it would be sent first), then passes it to whatever the backupsvr defines as zfsreceive. Receiving ZFS backups In the NNCP configuration on the recipient's side, in the laptop section, we define what command it's allowed to run as zfsreceive:
      exec: {
        zfsreceive: ["/usr/bin/sudo", "-H", "/usr/local/bin/nncp-zfs-receive"]
      }
We authorize the nncp user to run this under sudo in /etc/sudoers.d/local-nncp:
Defaults env_keep += "NNCP_SENDER"
nncp ALL=(root) NOPASSWD: /usr/local/bin/nncp-zfs-receive
The NNCP_SENDER is the public key ID of the sending node when nncp-toss processes the incoming data. We can use that for sanity checking later. Now, here's a basic nncp-zfs-receive script:
#!/bin/bash
set -e
set -o pipefail
STORE=backups/simplesnap
DEST="$1"
# now process stdin
zfs receive -o readonly=on -x mountpoint "$STORE/$DEST"
And there you have it all the basics are in place. Update 2020-12-30: An earlier version of this article had zfs receive -F instead of zfs receive -o readonly=on -x mountpoint . These changed arguments are more robust.
Update 2021-01-04: I am now recommending zfs receive -u -o readonly=on; see my successor article for more. Enhancements You could enhance the nncp-zfs-receive script to improve logging and error handling. For instance:
#!/bin/bash
set -e
set -o pipefail
STORE=backups/simplesnap
# $1 will be the host/dataset
DEST="$1"
HOST=" echo "$1"   sed 's,/.*,,g' "
if [ -z "$HOST" ]; then
   echo "Malformed command line"
   exit 5
fi
# Log a message
logit () {
   logger -p info -t "$(basename "$0")[$$]" "$1"
}
# Log an error message
logerror () {
   logger -p err -t "$(basename "$0")[$$]" "$1"
}
# Log stdin with the given code.  Used normally to log stderr.
logstdin () {
   logger -p info -t "$(basename "$0")[$$/$1]"
}
# Run command, logging stderr and exit code
runcommand () {
   logit "Running $*"
   if "$@" 2> >(logstdin "$1") ; then
      logit "$1 exited successfully"
      return 0
   else
       RETVAL="$?"
       logerror "$1 exited with error $RETVAL"
       return "$RETVAL"
   fi
}
exiterror () {
   logerror "$1"
   echo "$1" 1>&2
   exit 10
}
# Sanity check
if [ "$HOST" = "laptop" ]; then
  if [ "$NNCP_SENDER" != "12345678" ]; then
    exiterror "Host $HOST doesn't match sender $NNCP_SENDER"
  fi
else
  exiterror "Unknown host $HOST"
fi
runcommand zfs receive -u -o readonly=on "$STORE/$DEST"
Now you'll capture the ZFS receive output in syslog in a friendly way, so you can look back later to see why things failed if they did. Further notes on NNCP nncp-toss will examine the exit code from an invocation. If it is nonzero, it will keep the command (and associated stdin) in the queue and retry it on the next invocation. NNCP does not guarantee order of execution, so it is possible in some cases that ZFS streams may be received in the wrong order. That is fine here; zfs receive will exit with an error, and nncp-toss will just run it again after the dependent snapshots have been received. For non-ZFS backups, a simple sequence number can handle this issue (a sketch follows below).
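A minimal sketch of that sequence-number idea, assuming the sender passes a sequence number as the command argument; the state file path and the process_chunk step are illustrative:
#!/bin/bash
# Hypothetical receiver wrapper: only apply chunk N+1 after chunk N.
set -e
SEQ="$1"                               # sequence number supplied by the sender
SEQFILE="/var/local/backup-state/seq"  # last successfully applied number
EXPECTED=$(( $(cat "$SEQFILE" 2>/dev/null || echo 0) + 1 ))
if [ "$SEQ" -ne "$EXPECTED" ]; then
    echo "Out of order: got $SEQ, expected $EXPECTED" >&2
    exit 1  # nonzero exit makes nncp-toss keep the packet and retry later
fi
process_chunk   # illustrative: consume the actual stream from stdin
echo "$SEQ" > "$SEQFILE"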

27 December 2020

John Goerzen: See The World Through the Eyes of a Child, and You Are Free

Because we see things so often, we see them less and less. Those who live in thanksgiving daily, however, have a way of opening their eyes and seeing the wonders and beauties of this world as though seeing them for the first time. Joseph Wirthlin
Today is about dirt. I had to learn about it, again, from my 2-year-old last week. She and I were playing outside, something we have more time to do right now. She started to roll around in the grass, and asked me to play in the grass, too. As I got close to the ground, I inhaled the wonderfully sweet and earthy scent of spring soil. I hadn't smelled that in a long time. What an unexpected gift. This photo is of a child, having a fantastic time with dirt and water. The more bits of Kansas he had on him, the more shrieks of laughter I heard. I think most adults keep forgetting the joys of simple things like dirt. I am lucky to have children around to remind me. This week, I also had the opportunity to teach my 2-year-old the joys of making big splashes in mud puddles, so maybe I can also remind them on occasion.
See the world as if for the first time; see it through the eyes of a child, and you will suddenly find that you are free. Deepak Chopra
Rural Kansas, 2016 (I originally wrote this on March 31, and am sharing it on my blog for the first time today.)

John Goerzen: Asynchronous Email: Exim over NNCP (or UUCP)

Following up to yesterday's article about how NNCP rehabilitates asynchronous communication with modern encryption and onion routing, here is the first of my posts showing how to put it into action. Email is a natural fit for async; in fact, much of early email was carried by UUCP. It is useful for an airgapped machine to be able to send back messages: errors from cron, results of handling incoming data, disk space alerts, etc. (Of course, this applies to a non-airgapped machine also.) The NNCP documentation already describes how to do this for Postfix. Here I will show how to do it for Exim. A quick detour to UUCP land When you encounter a system such as email that has instructions for doing something via UUCP, that should be an alert to you that here is some very relevant information for doing the same thing via NNCP. The syntax is different, but broadly, here's a table of similar NNCP commands:
Purpose                                                     UUCP                     NNCP
Connect to remote system                                    uucico -s, uupoll        nncp-call, nncp-caller
Receive connection (pipe, daemon, etc)                      uucico (-l or similar)   nncp-daemon
Request remote execution, stdin piped in                    uux                      nncp-exec
Copy file to remote machine                                 uucp                     nncp-file
Copy file from remote machine                               uucp                     nncp-freq
Process received requests                                   uuxqt                    nncp-toss
Move outbound requests to dir (for USB stick, airgap, etc)  N/A                      nncp-xfer
Create streaming package of outbound requests               N/A                      nncp-bundle
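To make the file-copy row concrete, a hypothetical side-by-side (host, node, and path names are all illustrative):
# UUCP: bang-path style addressing to a neighboring system
uucp report.txt remotehost!~/incoming/
# NNCP: the equivalent copy; the destination syntax is node:path
nncp-file report.txt remotehost:incoming/report.txt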
If you used UUCP back in the day, you surely remember bang paths. I will not be using those here. NNCP handles routing itself, rather than making the MTA be aware of the network topology, so this simplifies things considerably. Sending from Exim to a smarthost One common use for async email is from a satellite system: one that doesn't receive mail, or have local mailboxes, but just needs to get email out to the Internet. This is a common situation even for conventionally-connected systems; in Exim speak, this is a satellite system that routes mail via a smarthost. That is, every outbound message goes to a specific target, which is then responsible for eventual delivery (over the Internet, LAN, whatever). This is fairly simple in Exim. We actually have two choices for how to do this: bsmtp or rmail mode. bsmtp (batch SMTP) is the more modern way, and is essentially a derivative of SMTP that can explicitly be queued asynchronously. Basically it's a set of SMTP commands that can be saved in a file. The alternative is rmail (which is just an alias for sendmail these days), where the data is piped to rmail/sendmail with the recipients given on the command line. Both can work with Exim and NNCP, but because we're doing shiny new things, we'll use bsmtp. These instructions are loosely based on the Using outgoing BSMTP with Exim HOWTO. Some of these may assume Debianness in the configuration, but should be easily enough extrapolated to other configs as well. First, configure Exim to use satellite mode with minimal DNS lookups (assuming that you may not have working DNS anyhow). Then, in the Exim primary router section for smarthost (router/200_exim4-config_primary in Debian split configurations), just change transport = remote_smtp_smarthost to transport = nncp. Now, define the NNCP transport. If you are on Debian, you might name this transports/40_exim4-config_local_nncp:
nncp:
  debug_print = "T: nncp transport for $local_part@$domain"
  driver = pipe
  user = nncp
  batch_max = 100
  use_bsmtp
  command = /usr/local/nncp/bin/nncp-exec -noprogress -quiet hostname_goes_here rsmtp
.ifdef REMOTE_SMTP_HEADERS_REWRITE
  headers_rewrite = REMOTE_SMTP_HEADERS_REWRITE
.endif
.ifdef REMOTE_SMTP_RETURN_PATH
  return_path = REMOTE_SMTP_RETURN_PATH
.endif
This is pretty straightforward. We pipe to nncp-exec, running it as the nncp user. nncp-exec sends it to a target node and runs whatever that node has called rsmtp (the command to receive bsmtp data). When the target node processes the request, it will run the configured command and pipe the data in to it. More complicated: Routing to various NNCP nodes Perhaps you would like to be able to send mail directly to various NNCP nodes. There are a lot of ways to do that. Fundamentally, you will need a setup similar to the UUCP example in Exim's manualroute manual, which lets you define how to reach various hosts via UUCP/NNCP (a hypothetical sketch appears at the end of this post). Perhaps you have a star topology (every NNCP node exchanges email with a central hub). In the NNCP world, you have two choices of how you do this. You could, at the Exim level, make the central hub the smarthost for all the side nodes, and let it redistribute mail. That would work, but requires decrypting messages at the hub to let Exim process them. The other alternative is to configure NNCP to just send to the destinations via the central hub; that takes advantage of onion routing and doesn't require any Exim processing at the central hub at all. Receiving mail from NNCP On the receiving side, first you need to configure NNCP to authorize the execution of a mail program. In the section of your receiving host where you set the permissions for the client, include something like this:
      exec: {
        rsmtp: ["/usr/sbin/sendmail", "-bS"]
      }
The -bS option is what tells Exim to receive BSMTP on stdin. Now, you need to tell Exim that nncp is a trusted user (able to set From headers arbitrarily). Assuming you are running NNCP as the nncp user, add MAIN_TRUSTED_USERS = nncp to a file such as /etc/exim4/conf.d/main/01_exim4-config_local-nncp. That's it! Some hosts, of course, both send and receive mail via NNCP and will need configurations for both.
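For the manualroute approach mentioned earlier, a hypothetical router stanza might look like the following. The domains and node names are illustrative, this is a sketch to adapt rather than a tested configuration, and the nncp transport's command would need to use $host instead of a fixed node name:
nncp_hosts:
  debug_print = "R: nncp_hosts for $local_part@$domain"
  driver = manualroute
  # Rules are separated by semicolons; the second field of each rule
  # becomes $host, which the transport hands to nncp-exec.
  route_list = alpha.example.org alphanode ; beta.example.org betanode
  transport = nncp
With that, the transport's command could become something like /usr/local/nncp/bin/nncp-exec -noprogress -quiet $host rsmtp.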

26 December 2020

John Goerzen: When You Think You re At the End, You re At the Beginning

Often when you think you're at the end of something, you're at the beginning of something else. Fred Rogers
This is sunrise over Kansas. Or maybe sunset. I'm not going to tell you this time, because it doesn't matter all that much. I love that it (if you don't over-analyze it) could be either, and also that it looks like the land in the distance fades to a blue ocean. Is it sunrise or sunset? What is at the horizon? I don't think it really matters, in the presence of such natural beauty. Rural Kansas, 2015

John Goerzen: Rehabilitating Asynchronous Communication with NNCP: A Cross Between Tor, ssh, and UUCP

Have you ever been traveling, shot a ton of photos and videos, but were annoyed to find it was saturating the terrible wifi you had access to? Maybe you'd wish the upload to pause until you got somewhere else, but pausing syncing on your Nextcloud/Syncthing/Dropbox would also pause other syncing you didn't want to pause. Or you have trouble backing up your laptop when not at home, in a way that won't accidentally eat up your cell phone data. There are ways to help with this: asynchronous transfer. Here's a lot of background. If you want to see how encrypted, onion-routed UUCP looks, skip ahead to the NNCP section! There is an old saying: when all you have is a hammer, every problem looks like a nail. We have this wonderful tool called ssh available, and it is pervasive and well-understood, so we tend to use it. But we've missed out on some benefits of asynchronous processing that we actually used to have more frequently. Of course, we are all used to some asynchronous services in our lives. Email is a popular example: most mail clients work offline and will transmit stored messages when the mail server becomes reachable. Mail servers themselves work that way, too. Many instant messaging platforms do as well. Even some backup systems do. Bacula/Bareos, for instance, spools all backup data to disk on the system connected to the tape drive, and from there to the tape itself. They do this for several reasons, but primarily the fact that if tape drives are not fed with data at their design speed, it can cause physical damage to the tape or even the drive. The drive has to pause and seek backwards to reposition for the next write. This creates excessive travel of the tape over the write heads, causing a condition known as tape shine where the tape is damaged prematurely. Here are some problems people often run into when sending data across a network (or the Internet) synchronously: Of course, there are plenty of situations where synchronous communication is a must. For instance: I suspect that the reason we don't do more asynchronous processing these days, despite it being strong in the Unix heritage, is the lack of modern tools to do it. Let's explore some more. Some of my use cases I run ZFS on all my systems that support it: file server, laptops, workstations, etc. It is only natural to use ZFS send/receive to do backups, and I do. However, when I am traveling, my laptop never gets backed up, because the backups are pulled from the backup system. Sure, there are ways around that; a VPN, for instance. But then we have the situation where sometimes I do not want to send the backup even if I have a working Internet connection: perhaps I'm tethered to a mobile connection and it would be expensive to do so, or I'm on hotel wifi that is flaky and slow and I don't want to give up any of its meager bandwidth. I have another backup-related problem. I have a remote server, which until recently was using extremely slow disks. If I made significant changes, the backup would take the better part of a day. That's annoying when I try to back up hourly. So of course I had to implement locking, but then that means none of my other machines would back up that day either. Once I needed to transmit about 2TB of data. My home Internet connection was terribly slow, and I calculated it would take multiple months to do this.
So I took to manually copying parts of the data to my laptop, and whenever I'd find an airport or coffee shop with faster Internet than at home, I'd send off those bits from it. But it took a ton of work. The bespoke asynchronous problem And that ton of work is perhaps why we aren't doing more of this. There's been no great standard solution, so it's all roll-your-own when you need it. So we just use ssh, because it's easier and usually good enough. But as I wrote in my recent article on airgapped backups, there are reasons to go async. Solutions Wouldn't it be great to be able to queue up data for a machine, and let it get there in whatever way it can? Maybe a fast Internet connection is found, or via Tor, or via copying to a USB stick, or via radio broadcast? It would make many of these scenarios a lot easier. And there are ways to do this now, with modern security! We have some tools on Linux for this: git-annex for storage and migration, syrep for synchronization, and NNCP for file transfer and remote execution (which could be combined with some of these other tools). Let's dive in to NNCP. NNCP If you already know UUCP, think of NNCP as UUCP brought into the modern era, with modern security and tools. Basically, NNCP permits you to send files to a remote system, request files from a remote system, and pipe data to an NNCP command that requests execution remotely. So you could, say, pipe a zfs send to NNCP, which sends it to the remote and pipes it to zfs receive when it gets there (a sketch follows below). NNCP has a delay-tolerant, resumable protocol that can run over just about any reliable connection: TCP, serial, Tor, radios of various kinds, you name it. But that's not all; it can also dump its queue onto something like a USB stick for transport, or even make a tar-style stream that could be munged however you like. If you want to get fancy, you can assign priorities to data packets, so that, for instance, outbound email will always get sent before that 1TB file you've also got to send. You can also configure it so that certain carriers handle certain priorities of data; your cell phone would only handle the most urgent, but a USB stick would take anything. NNCP is source-routed; you can tell it that the way that Bob reaches Alice is via Carl, then Betty. Bob can generate a message that will be sent along that route, fully encrypted and authenticated at each step of the way; Carl can't see the content of the message or even anything about it other than its next hop. How this helps Let's revisit some of my scenarios with NNCP. For the laptop being backed up: while traveling, it can queue up its backups, or photos, or videos, or whatever. They could be triggered by a command when on a good connection, or automatically. The data could be copied to USB and given to a friend to transmit; perfectly safe due to encryption. Or it could all wait until arriving at home, safely out of your other syncing directories. The NNCP documentation has an example of this. For the server being backed up slowly, that's easily solved; the slow backup would simply be queued up, and transmitted and processed when it's ready. This wouldn't interrupt other backups. How about the 2TB transmission problem? That's also made a lot easier. A command could be run to fill up a USB stick with parts of the queue, then that USB stick plugged in and transmitted whenever at a fast location. Repeat as needed while the slow system continues its upload of the remaining bits. NNCP has a lot of interesting use cases documented as well.
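To make the zfs send example concrete, a hypothetical one-liner (the node name backupsvr, the zfsreceive handle, and the snapshot names are all illustrative; later posts in this series flesh this out properly):
# Queue an incremental zfs send as low-priority bulk data for backupsvr;
# the remote end runs whatever command it has registered as "zfsreceive".
zfs send -i tank/data@yesterday tank/data@today \
    | nncp-exec -nice B backupsvr zfsreceive 'data'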
If you are already familiar with how public keys work in SSH, then NNCP should be immediately familiar as well. It is a similar concept (though arguably somewhat easier to set up). I am working on setting up an NNCP network, and will have more posts on how to do so once I've got it going. In the meantime, the documentation for the project is also pretty good.

25 December 2020

John Goerzen: So Many Caring People In This World

When I was a boy and I would see scary things in the news, my mother would say to me, Look for the helpers. You will always find people who are helping. To this day, especially in times of disaster, I remember my mother's words and I am always comforted by realizing that there are still so many helpers, so many caring people in this world. Fred Rogers
This photo doesn't have amazing lighting or fantastic composition. In fact, it looks ordinary. It's only when you know its story that the beauty shines through. In 2019, Fremont, NE had been cut off by flooding. People were trying to get in and out of the town, and couldn't. The only way in or out was from the small airport. About 50 pilots from Nebraska and Iowa (and me from Kansas) came over to help. Every plane you see here, and more that I couldn't fit in the frame, was at Millard Airport, Omaha, flying people and supplies into and out of Fremont. Estimates are that we flew over 1000 people and tons of supplies that weekend. All safely. I remember flying the family with a 1-week-old baby that had gone to Omaha for a doctor appointment and then couldn't get home for three days. I remember the elderly couple and their dog that I flew out of Fremont, the former Marine riding in the co-pilot seat next to me, cracking jokes with me as we went. I remember the group of ladies that were laughing as I gave them the required seat belt briefing, the mom with her kids, the man that had to get to work, and packed a few days' supplies in his backpack, unsure when he'd be able to get back home. It was the first time some had ever been in a plane. There were so many helpers. People all over Omaha brought supplies to Millard airport. Others had just shown up at both airports to help organize and make lists of passengers and match them up with different size planes. Someone at Millard had a trailer and golf cart for supplies. When I'd land at Fremont, before I was even out of the plane, highschoolers had already swarmed it and were helping to unload supplies or help passengers out. One time when I got back to Millard, I found lunch: a pizza place had donated pizza for the pilots, and then a restaurant in Fremont did too. With so many extra planes in the sky, Omaha ATC was slammed and still did a fantastic job. I of course took no photos of the people I carried, but I have thought of them often in the last year. This photo is of Millard airport, Omaha, loading up supplies. Look at all these helpers. I think it's one of the most beautiful sights I've ever seen. Today there are billions of helpers in this world. The obvious ones: the truck drivers, the health care workers, the grocery store workers. And also the non-obvious ones: all the people that are wearing masks, practicing social distancing, forgoing Christmas gatherings, for the good of others, despite the hardship and heartbreak it may cause. If you didn't know the story of this photo, you wouldn't know these planes were all helpers in time of disaster. We don't know the story of all the people we see in our world today, but chances are good that many of them are helpers also. When you know their story, the beauty shines through. May we all be able to see the beauty that still surrounds us this Christmas.

24 December 2020

John Goerzen: Joyful is the Dark

Joyful is the dark
coolness of the tomb,
waiting for the wonder
of the morning.
Never was that midnight
touched by dread and gloom;
darkness was the cradle
of the dawning. Brian Wren
Most of us are not personally experiencing symptoms of a pandemic virus, but with all the changes around us, with all the worries within us, many of us have been touched by dread and gloom. I can find many layers of meaning in this poem. Today I think this poem excerpt reminds us to find the joy where we are, in the moment we are in. Are we just waiting for the morning? Or do we take advantage of this pause to gaze up at the beautiful colors of the night? Are we defined by this moment, or do we define it? We are OK right now, so let this time be the cradle of something beautiful. If you haven't noticed the moon in this photo, zoom in. I don't see dread and gloom in this photo. I see color, and hope, and beauty. Rural Kansas, 2017

23 December 2020

John Goerzen: O, Sunlight!

O, Sunlight! The most precious gold to be found on Earth. Roman Payne
There is much beauty in this world, much hope, much life. All we need to do is pause, breathe, and take a moment to see it. It might be as simple as the gift of sunlight. I hope you all have moments of sunlight and delight each day. Marion County, KS, April 2013

John Goerzen: How & Why To Use Airgapped Backups

A good backup strategy needs to consider various threats to the integrity of data. For instance: It's that last one that is of particular interest today. A lot of backup strategies are such that if a user (or administrator) has their local account or network compromised, their backups could very well be destroyed as well. For instance, do you ssh from the account being backed up to the system holding the backups? Or rsync using a keypair stored on it? Or access S3 buckets, etc? It is trivially easy in many of these schemes to totally ruin cloud-based backups, or even some other schemes. rsync can be run with delete (and often is, to prune remotes), S3 buckets can be deleted, etc. And even if you try to lock down an over-network backup to be append-only, still there are vectors for attack (ssh credentials, OpenSSL bugs, etc). In this post, I try to explore how we can protect against them and still retain some modern conveniences. A backup scheme also needs to make a balance between: My story so far About 20 years ago, I had an Exabyte tape drive, with the amazing capacity of 7GB per tape! Eventually as disk prices fell, I had external disks plugged in to a server, and would periodically rotate them offsite. I've also had various combinations of partial or complete offsite copies over the Internet as well. I have around 6TB of data to back up (after compression), a figure that is growing somewhat rapidly as I digitize some old family recordings and videos. Since I last wrote about backups 5 years ago, my scheme has been largely unchanged; at present I use ZFS for local and to-disk backups and borg for the copies over the Internet. Let's take a look at some options that could make this better. Tape The original airgapped backup. You back up to a tape, then you take the (fairly cheap) tape out of the drive and put in another one. In cost per GB, tape is probably the cheapest medium out there. But of course it has its drawbacks. Let's start with cost. To get a drive that can handle capacities of what I'd be needing, at least LTO-6 (2.5TB per tape) would be needed, if not LTO-7 (6TB). New, these drives cost several thousand dollars, plus they need LVD SCSI or Fibre Channel cards. You're not going to be hanging one off a Raspberry Pi; these things need a real server with enterprise-style connectivity. If you're particularly lucky, you might find an LTO-6 drive for as low as $500 on eBay. Then there are tapes. A 10-pack of LTO-6 tapes runs more than $200, and provides a total capacity of 25TB, sufficient for these needs (note that, of course, you need to have at least double the actual space of the data, to account for multiple full backups in a set). A 5-pack of LTO-7 tapes is a little more expensive, while providing more storage. So all-in, this is going to be, in the best possible scenario, nearly $1000, and possibly a lot more. For a large company with many TB of storage, the initial costs can be defrayed due to the cheaper media, but for a home user, not so much. Consider that 8TB hard drives can be found for $150 to $200. A pair of them (for redundancy) would run $300-400, and then you have all the other benefits of disk (quicker access, etc.). Plus they can be driven by something as cheap as a Raspberry Pi. Fancier tape setups involve auto-changers, but then you're not really airgapped, are you? (If you leave all your tapes in the changer, they can generally be selected and overwritten, barring things like hardware WORM.)
As useful as tape is, for this project, it would simply be way more expensive than disk-based options. Fundamentals of disk-based airgapping The fundamental thing we need to address with disk-based airgapping is that the machines being backed up have no real-time contact with the backup storage system. This rules out most solutions out there, which want to sync by comparing local state with remote state. If one is willing to throw storage efficiency out the window (maybe practical for very small data sets), one could just send a full backup daily. But in reality, what is more likely needed is a way to store a local proxy for the remote state. Then a runner device (a USB stick, disk, etc) could be plugged into the network, filled with queued data, then plugged into the backup system to have the data dequeued and processed. Some may be tempted to short-circuit this and just plug external disks into a backup system. I've done that for a long time. This is, however, a risk, because it makes those disks vulnerable to whatever may be attacking the local system (anything from lightning to ransomware). ZFS ZFS is, it should be no surprise, particularly well suited for this. zfs send/receive can send an incremental stream that represents a delta between two checkpoints (snapshots or bookmarks) on a filesystem. It can do this very efficiently, much more so than walking an entire filesystem tree. Additionally, with the recent addition of ZFS crypto to ZFS on Linux, the replication stream can optionally reflect the encrypted data. Yes, as long as you don't need to mount them, you can mostly work with ZFS datasets on an encrypted basis, and can directly tell zfs send to just send the encrypted data instead of the decrypted data. The downside of ZFS is the resource requirements at the destination, which in terms of RAM are higher than most of the older Raspberry Pi-style devices. Still, one could perhaps just save off zfs send streams and restore them later if need be, but that implies a periodic resend of a full stream, an inefficient operation. Deduplicating software such as borg could be used on those streams (though with less effectiveness if they're encrypted). Tar Perhaps surprisingly, tar in listed incremental mode can solve this problem for non-ZFS users. It will keep a local cache of the state of the filesystem as of the time of the last run of tar, and can generate new tarballs that reflect the changes since the previous run (even deletions); a sketch appears after this section. This can achieve a similar result to the ZFS send/receive, though in a much less elegant way. Bacula / Bareos Bacula (and its fork Bareos) both have support for a FIFO destination. Theoretically this could be used to queue up data for transfer to the airgapped machine. This support is very poorly documented in both and is rumored to have bitrotted, however. rdiff and xdelta rdiff and xdelta can be used as sort of a non-real-time rsync, at least on a per-file basis. Theoretically, one could generate a full backup (with tar, ZFS send, or whatever), take an rdiff signature, and send over the file while keeping the signature. On the next run, another full backup is piped into rdiff, and on the basis of the signature file of the old and the new data, it produces a binary patch that can be queued for the backup target to update its stored copy of the file. This leaves history preservation as an exercise to be undertaken on the backup target. It may not necessarily be easy and may not be efficient.
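Returning to the tar option above, a minimal sketch of listed-incremental mode (the paths are illustrative, and remember the GNU tar incremental caveat mentioned in the later post on this page):
# Level 0: record filesystem state in the .snar file and emit a full archive.
tar --create --listed-incremental=/var/lib/backup/home.snar \
    --file=/var/local/queue/home-full.tar /home
# Later runs against the same state file emit only what changed since the
# previous run, including enough information to reproduce deletions.
tar --create --listed-incremental=/var/lib/backup/home.snar \
    --file=/var/local/queue/home-incr.tar /home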
rsync batches rsync can be used to compute a delta between two directory trees and express this as a single-file batch that can be processed by a remote rsync. Unfortunately this implies the sender must always keep an old tree around (barring a solution such as ZFS snapshots) in order to compute the delta, and of course it still implies the need for history processing on the remote. Getting the Data There OK, so you've got an airgapped system, and some sort of runner device for your sneakernet (USB stick, hard drive, etc). Now what? Obviously you could just copy data onto the runner and move it back off at the backup target. But a tool like NNCP (sort of a modernized UUCP) offers a lot of help in automating the process, returning error reports, etc. NNCP can be used online over TCP, over reliable serial links, over ssh, with offline onion routing via intermediaries or directly, etc. Imagine having an airgapped machine at a different location you go to frequently (workplace, friend, etc). Before leaving, you put a USB stick in your pocket. When you get there, you pop it in. It's despooled and processed while you wait, and return emails or whatever are queued up to be sent when you get back home (a sketch of this workflow follows). Not bad, eh? Future installment I'm going to try some of these approaches and report back on my experiences in the next few weeks.
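In the meantime, here is a hypothetical sketch of that USB-stick runner workflow using nncp-xfer (the mountpoint is illustrative):
# On the system with queued data: write outbound packets onto the stick.
nncp-xfer -tx -mkdir /media/usbstick
# On the airgapped machine: ingest the packets, then process them.
nncp-xfer -rx /media/usbstick
nncp-toss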

John Goerzen: Every Storm Runs Out Of Rain

Every storm runs out of rain. Maya Angelou
There are a lot of rain clouds in life these days. May we all remember that days like this one are behind us and also ahead of us. Every storm runs out of rain.
That was the start of a series of photos from my collection & quotes I shared with friends during the initial lockdown in spring. I'll be sharing some here. And here we all are, still dealing with this, and it's more severe in a lot of ways. One of my colleagues won't be able to see his parents this Christmas for the first time in over 40 years. But this storm will run out of rain. And look how the scene changed, in just a few minutes. This is coming!

16 December 2020

John Goerzen: Non-Creepy Technology Purchasing & Gifting Guides

This time of year, a lot of people are thinking of buying gadgets and phones as gifts. But there are a lot of tech companies that have unethical practices, from terrible working conditions in their factories to spying on their users. Here are some buying guides to help you find gadgets that are fun and not creepy. The Free Software Foundation's Ethical Tech Giving Guide is a fantastic resource from what's probably the pickiest organization out there when it comes to tech. Not only do they highlight good devices, they also explain why, and why you should, for instance, avoid the iPhone (their history of silencing political activists and spying on users). The FSF also has a Guide to DRM-Free Living, which talks about books, video, audio, and software that respect your freedom by letting you make your own backups, move them to other devices, and continue to use your purchases even if you have no Internet or the company you bought them from goes bankrupt. This is a fantastic and HUGE resource; there are hundreds of organizations out there that provide content in a way that respects your rights, and many of them do it for free, legally, as well. PrivacyTools has a fantastic series of guides on everything from email providers to operating systems, as well as links to a number of other guides. The DeGoogle wiki on Reddit (as well as the sidebar) has a lot of fantastic alternatives to things like Chromebooks, Chrome, Gmail, etc. Related resources Here are some resources for education (what the issues are) and information about what companies and products to avoid. In addition to the FSF's other fantastic resources above, they also have a list of proprietary malware. It lists things, practices, and companies to avoid, and talks about the reasons why. Their addictions page is particularly good and relevant to my recent post on the problems of the attention economy. The Surveillance Self-Defense site from the Electronic Frontier Foundation is a fantastic introduction to how corporate surveillance works and how to defend against it. Use with a grain of salt: Mozilla, the people behind Firefox, have a site called Privacy Not Included that rates products by how creepy they are. They focus more narrowly on privacy than the more expansive set of freedoms the FSF considers (privacy is one of a number of things the FSF looks at), and in some cases I would say Mozilla is too generous (e.g., with the Amazon Kindle, a number of their data points are just incorrect).
